{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ac1e281c",
   "metadata": {},
   "source": [
    "# Video Q&A LLM (Gemini)\n",
    "\n",
    "- Author: [Youngin Kim](https://github.com/Normalist-K)\n",
    "- Design: [Teddy](https://github.com/teddylee777)\n",
    "- Peer Review :\n",
    "- Proofread : [frimer](https://github.com/brian604)\n",
    "- This is a part of [LangChain Open Tutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)\n",
    "\n",
    "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/04-Model/12-Gemini-Video.ipynb)[![Open in GitHub](https://img.shields.io/badge/Open%20in%20GitHub-181717?style=flat-square&logo=github&logoColor=white)](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/04-Model/12-Gemini-Video.ipynb)\n",
    "\n",
    "## Overview\n",
    "\n",
    "This tutorial demonstrates how to use the ```Gemini API``` to process and analyze video content.\n",
    "\n",
    "Specifically, it shows how to **upload a video file** using the ```File API``` and then use a generative model to extract descriptive information about the video.\n",
    "\n",
    "The workflow uses the ```gemini-1.5-flash``` model to generate a text-based description of a given video clip.\n",
    "\n",
    "Additionally, it shows how to integrate the Gemini model into a LangChain workflow so that video content can be processed and analyzed within the LangChain framework.\n",
    "\n",
    "\n",
    "### Table of Contents\n",
    "\n",
    "- [Overview](#overview)\n",
    "- [Environment Setup](#environment-setup)\n",
    "- [Data Preparation](#data-preparation)\n",
    "- [Upload and Preprocess video using Gemini API](#upload-and-preprocess-video-using-gemini-api)\n",
    "- [Generate content (Gemini API)](#generate-content-gemini-api)\n",
    "- [Integrating the Gemini Model into a LangChain Workflow for Video Data](#integrating-the-gemini-model-into-a-langchain-workflow-for-video-data)\n",
    "- [File Deletion](#file-deletion)\n",
    "\n",
    "### References\n",
    "\n",
    "- [Gemini API (Cookbook) - Video](https://ai.google.dev/api/generate-content#video)\n",
    "- [LangChain Components - google_generative_ai](https://python.langchain.com/docs/integrations/chat/google_generative_ai/)\n",
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac867129",
   "metadata": {},
   "source": [
    "## Environment Setup\n",
    "\n",
    "Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.\n",
    "\n",
    "**[Note]**\n",
    "- ```langchain-opentutorial``` is a package that provides easy-to-use environment setup helpers, functions, and utilities for the tutorials.\n",
    "- See the [```langchain-opentutorial```](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) repository for more details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "637ef359",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%capture --no-stderr\n",
    "!pip install langchain-opentutorial"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "455a3298",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install required packages\n",
    "from langchain_opentutorial import package\n",
    "\n",
    "package.install(\n",
    "    [\n",
    "        \"langsmith\",\n",
    "        \"langchain\",\n",
    "        \"langchain_core\",\n",
    "        \"langchain_google_genai\",\n",
    "        \"google-generativeai\",\n",
    "    ],\n",
    "    verbose=False,\n",
    "    upgrade=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f14f575c",
   "metadata": {},
   "source": [
    "**API Key Issuance**\n",
    "\n",
    "- Obtain an API key from [this link](https://makersuite.google.com/app/apikey?hl=en).\n",
    "\n",
    "**Important:**\n",
    "\n",
    "- The ```File API``` used in this tutorial requires an ```API key``` for authentication and access.\n",
    "\n",
    "- Uploaded files are linked to the cloud project associated with the ```API key```.\n",
    "\n",
    "Unlike with other ```Gemini APIs```, the ```API key``` here also grants access to the data you have uploaded via the ```File API```, so it is crucial to store the key securely."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "2dd02587",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Environment variables have been set successfully.\n"
     ]
    }
   ],
   "source": [
    "# Set environment variables\n",
    "from langchain_opentutorial import set_env\n",
    "\n",
    "set_env(\n",
    "    {\n",
    "        \"GOOGLE_API_KEY\": \"\",\n",
    "        \"LANGCHAIN_API_KEY\": \"\",\n",
    "        \"LANGCHAIN_TRACING_V2\": \"true\",\n",
    "        \"LANGCHAIN_ENDPOINT\": \"https://api.smith.langchain.com\",\n",
    "        \"LANGCHAIN_PROJECT\": \"Video-Q&A-LLM-Gemini\",\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "f6b1f465",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from dotenv import load_dotenv\n",
    "\n",
    "load_dotenv()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e416000c",
   "metadata": {},
   "source": [
    "## Data Preparation\n",
    "\n",
    "This tutorial uses a license-free video from Pexels.\n",
    "\n",
    "- Author: [SwissHumanity Stories](https://www.pexels.com/ko-kr/@swisshumanity-stories-1686058/)\n",
    "- Link: [Drone Footage of a Train Traveling on a Valley](https://www.pexels.com/video/drone-footage-of-a-train-traveling-on-a-valley-8290926/)\n",
    "\n",
    "Please download the video and save it as ```./data/sample-video.mp4```, the path used in the next cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "17c70def",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set video file name\n",
    "video_path = \"data/sample-video.mp4\""
   ]
  },
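  {
   "cell_type": "markdown",
   "id": "3b9d2e71",
   "metadata": {},
   "source": [
    "Before uploading, it can help to confirm that the video is where the notebook expects it. The following cell is a minimal sketch and not part of the original workflow: ```describe_video_file``` is a hypothetical helper that uses only the Python standard library to check that the path exists and to guess the MIME type from the file extension."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7c41a0f5",
   "metadata": {},
   "outputs": [],
   "source": [
    "import mimetypes\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def describe_video_file(path):\n",
    "    \"\"\"Return (size_in_bytes, guessed_mime_type) for a local video file.\n",
    "\n",
    "    Hypothetical helper for illustration. Raises FileNotFoundError when the\n",
    "    path does not point to an existing file.\n",
    "    \"\"\"\n",
    "    file_path = Path(path)\n",
    "    if not file_path.is_file():\n",
    "        raise FileNotFoundError(f\"Video not found: {file_path}\")\n",
    "    # guess_type inspects only the file extension, not the file contents\n",
    "    mime_type, _ = mimetypes.guess_type(file_path.name)\n",
    "    return file_path.stat().st_size, mime_type\n",
    "\n",
    "\n",
    "# Example (assumes the video has already been downloaded):\n",
    "# size, mime_type = describe_video_file(video_path)"
   ]
  },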
  {
   "cell_type": "markdown",
   "id": "a0e0275b",
   "metadata": {},
   "source": [
    "## Upload and Preprocess video using Gemini API\n",
    "\n",
    "Next, use the File API to upload the video file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "ed380475",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Uploading files...\n",
      "Upload complete: https://generativelanguage.googleapis.com/v1beta/files/ycq94nkeb9gd\n"
     ]
    }
   ],
   "source": [
    "import google.generativeai as genai\n",
    "\n",
    "print(\"Uploading files...\")\n",
    "\n",
    "# Upload the file and return the file object\n",
    "video_file = genai.upload_file(path=video_path)\n",
    "\n",
    "print(f\"Upload complete: {video_file.uri}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf8eea99",
   "metadata": {},
   "source": [
    "After uploading the file, you can call ```get_file``` to verify that the API has finished processing it.\n",
    "\n",
    "The ```get_file``` call retrieves the metadata of an uploaded file, including its processing state, from the cloud project linked to the API key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "4f7e891f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Please wait while the video upload and preprocessing are completed...\n",
      "\n",
      "Video processing is complete!\n",
      "You can now start the conversation: https://generativelanguage.googleapis.com/v1beta/files/ycq94nkeb9gd\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "\n",
    "# Videos need to be processed before you can use them.\n",
    "# Poll every 5 seconds until the file leaves the PROCESSING state.\n",
    "while video_file.state.name == \"PROCESSING\":\n",
    "    print(\"Please wait while the video upload and preprocessing are completed...\")\n",
    "    time.sleep(5)\n",
    "    video_file = genai.get_file(video_file.name)\n",
    "\n",
    "# Raise an exception with a descriptive message if processing fails\n",
    "if video_file.state.name == \"FAILED\":\n",
    "    raise ValueError(f\"Video processing failed for {video_file.name}\")\n",
    "\n",
    "# Print completion message\n",
    "print(\n",
    "    f\"\\nVideo processing is complete!\\nYou can now start the conversation: {video_file.uri}\"\n",
    ")"
   ]
  },
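  {
   "cell_type": "markdown",
   "id": "9e2f6b13",
   "metadata": {},
   "source": [
    "The loop above keeps polling for as long as the file stays in the ```PROCESSING``` state. As a hedged alternative, the sketch below adds a deadline. ```wait_until_processed``` and its parameters are hypothetical names, not part of the Gemini SDK. The fetch function is passed in as an argument, so the helper can be used with ```genai.get_file``` or tried out with a stand-in."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d05a8c47",
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "\n",
    "def wait_until_processed(fetch_file, name, timeout_s=300.0, poll_s=5.0):\n",
    "    \"\"\"Poll fetch_file(name) until the file leaves the PROCESSING state.\n",
    "\n",
    "    fetch_file is any callable that returns an object exposing state.name,\n",
    "    for example genai.get_file. Illustrative helper, not an SDK function.\n",
    "    \"\"\"\n",
    "    deadline = time.monotonic() + timeout_s\n",
    "    current = fetch_file(name)\n",
    "    while current.state.name == \"PROCESSING\":\n",
    "        # Give up once the deadline has passed instead of waiting forever\n",
    "        if time.monotonic() >= deadline:\n",
    "            raise TimeoutError(f\"{name} is still processing after {timeout_s} seconds\")\n",
    "        time.sleep(poll_s)\n",
    "        current = fetch_file(name)\n",
    "    if current.state.name == \"FAILED\":\n",
    "        raise ValueError(f\"Processing failed for {name}\")\n",
    "    return current\n",
    "\n",
    "\n",
    "# Example (assumes video_file was returned by genai.upload_file):\n",
    "# video_file = wait_until_processed(genai.get_file, video_file.name)"
   ]
  },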
  {
   "cell_type": "markdown",
   "id": "f930a229",
   "metadata": {},
   "source": [
    "## Generate content (Gemini API)\n",
    "\n",
    "After the video has been processed, you can call the model's ```generate_content``` method to ask questions about the video."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "4bea99f4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Here's a description of the video clip:\n",
      "\n",
      "The video shows an aerial, high-angle view of a red passenger train traveling along a railway line that runs parallel to a road through a picturesque valley. \n",
      "\n",
      "\n",
      "Here's a breakdown of the scene:\n",
      "\n",
      "* **The Train:** A long, red passenger train is the central focus, moving from the bottom to the middle of the frame.  It's a fairly modern-looking train.\n",
      "\n",
      "* **The Valley:** The valley is lush green, with fields dotted with yellow wildflowers (likely dandelions).  The grass is vibrant and appears to be well-maintained pastureland. Several farmhouses and buildings are scattered throughout the valley. A small stream or river meanders alongside the road and tracks.\n",
      "\n",
      "* **The Mountains:** Towering mountains, partially snow-capped, form a dramatic backdrop. The mountains are steep and rocky, showcasing a mix of textures and shades of green and grey.\n",
      "\n",
      "* **The Atmosphere:** The overall atmosphere is peaceful and idyllic, with clear blue skies and abundant sunlight suggesting a pleasant spring or summer day.\n",
      "\n",
      "The video appears to be drone footage, smoothly following the train's progress through the valley. The camera angle provides a sweeping perspective that showcases the beauty of the landscape and the integration of the train within the environment. The entire scene evokes a sense of serene beauty and the charm of rural Switzerland.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Prompt message\n",
    "prompt = \"Describe this video clip\"\n",
    "\n",
    "# Set model to Gemini 1.5 Flash\n",
    "model = genai.GenerativeModel(model_name=\"models/gemini-1.5-flash\")\n",
    "\n",
    "# Send the prompt and the video to the model (timeout is in seconds)\n",
    "response = model.generate_content(\n",
    "    [prompt, video_file], request_options={\"timeout\": 600}\n",
    ")\n",
    "\n",
    "# Print the response text\n",
    "print(response.text)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "06e4c1e5",
   "metadata": {},
   "source": [
    "Below is an example of streaming output. With the ```stream=True``` option, the response arrives in chunks that can be printed as they are generated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "ddaa6585",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "That's a narrow-gauge railway train.  More specifically, it appears to be a type of railcar used on the  Appenzell Bahn (AB) in Switzerland.  The train is primarily red in color, with some black and white accents.\n"
     ]
    }
   ],
   "source": [
    "# Prompt message\n",
    "prompt = \"What type of train is shown in this video, and what color is it?\"\n",
    "\n",
    "# Set model to Gemini 1.5 Flash\n",
    "model = genai.GenerativeModel(model_name=\"models/gemini-1.5-flash\")\n",
    "\n",
    "# Request a streaming response from the model\n",
    "response = model.generate_content(\n",
    "    [prompt, video_file], request_options={\"timeout\": 600}, stream=True\n",
    ")\n",
    "\n",
    "# Print each chunk as it arrives\n",
    "for chunk in response:\n",
    "    print(chunk.text, end=\"\", flush=True)"
   ]
  },
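  {
   "cell_type": "markdown",
   "id": "61fb3a9e",
   "metadata": {},
   "source": [
    "The streaming loop above prints each chunk but does not keep the text. If the full answer is needed afterwards, one option is to collect the chunks while printing them. The cell below is a small sketch under that assumption: ```collect_stream``` is a hypothetical helper that only relies on each chunk exposing a ```text``` attribute, as the chunks above do."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c8d47e20",
   "metadata": {},
   "outputs": [],
   "source": [
    "def collect_stream(chunks, echo=True):\n",
    "    \"\"\"Optionally print streamed chunks as they arrive and return the full text.\n",
    "\n",
    "    Hypothetical helper for illustration. Each chunk must expose a text attribute.\n",
    "    \"\"\"\n",
    "    parts = []\n",
    "    for chunk in chunks:\n",
    "        if echo:\n",
    "            print(chunk.text, end=\"\", flush=True)\n",
    "        parts.append(chunk.text)\n",
    "    return \"\".join(parts)\n",
    "\n",
    "\n",
    "# Example (assumes response was created with stream=True and not yet consumed):\n",
    "# full_text = collect_stream(response)"
   ]
  },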
  {
   "cell_type": "markdown",
   "id": "c9baa26d",
   "metadata": {},
   "source": [
    "## Integrating the Gemini Model into a LangChain Workflow for Video Data\n",
    "\n",
    "Here is an example of using LangChain with the Gemini model.\n",
    "\n",
    "The model is loaded via ```ChatGoogleGenerativeAI``` from ```langchain_google_genai```. Multimodal data can be included in the content of a ```HumanMessage``` using the ```media``` type, which references the uploaded video by its ```file_uri``` and ```mime_type```."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "dea7aa3f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This\n",
      " video shows an aerial view of a red train traveling along a railway line that runs\n",
      " parallel to a road through a picturesque valley in what appears to be the Swiss Alps\n",
      ". \n",
      "\n",
      "\n",
      "Here's a breakdown of the content:\n",
      "\n",
      "* **Scenery:** The valley is lush green, with fields dotted with yellow wildflowers, likely\n",
      " dandelions.  The valley is surrounded by steep, verdant hillsides and majestic snow-capped mountains in the background. A small stream or river runs\n",
      " alongside the road and railway.  Several farmhouses or chalets are scattered throughout the valley.  The overall impression is one of idyllic rural Switzerland.\n",
      "\n",
      "* **Transportation:** A long red train is the central focus, moving steadily along the\n",
      " railway tracks. The train appears to be a passenger train, given its length and the typical design.  A road runs parallel to the tracks, offering a contrasting mode of transportation.\n",
      "\n",
      "* **Camera Work:** The video is shot from a\n",
      " drone, providing a high-angle, sweeping perspective. The camera follows the train as it moves through the valley, giving viewers a sense of the scale and beauty of the landscape. The drone maintains a relatively constant distance and speed to follow the train.\n",
      "\n",
      "* **Overall Impression:** The video is visually stunning and evokes a\n",
      " sense of tranquility and the beauty of nature in a mountainous region.  It's a perfect example of promotional footage for tourism, showcasing Switzerland's landscapes and transportation infrastructure.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from langchain_google_genai import ChatGoogleGenerativeAI\n",
    "from langchain_core.messages import HumanMessage\n",
    "\n",
    "# Initialize the Gemini model with the specified version\n",
    "llm = ChatGoogleGenerativeAI(model=\"gemini-1.5-flash\")\n",
    "\n",
    "# Create a message to send to the model and attach the video file as media input\n",
    "message = HumanMessage(\n",
    "    content=[\n",
    "        {\"type\": \"text\", \"text\": \"Please analyze the content of this video.\"},\n",
    "        {\n",
    "            \"type\": \"media\",\n",
    "            \"mime_type\": video_file.mime_type,\n",
    "            \"file_uri\": video_file.uri,\n",
    "        },\n",
    "    ]\n",
    ")\n",
    "\n",
    "# Stream the response and print each chunk without inserting extra line breaks\n",
    "for chunk in llm.stream([message]):\n",
    "    print(chunk.content, end=\"\", flush=True)"
   ]
  },
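  {
   "cell_type": "markdown",
   "id": "2a6e9f0b",
   "metadata": {},
   "source": [
    "The message above can also be wrapped in a reusable chain, so that different questions can be asked about the same uploaded video. The cell below is a sketch rather than the tutorial's own method: ```build_video_content``` and ```make_video_qa_chain``` are hypothetical helper names, while ```RunnableLambda``` and ```StrOutputParser``` come from ```langchain_core```. The chain maps a question string to the model's answer text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7b31c58",
   "metadata": {},
   "outputs": [],
   "source": [
    "def build_video_content(question, file_uri, mime_type):\n",
    "    \"\"\"Build the multimodal content list for one question about one video.\"\"\"\n",
    "    return [\n",
    "        {\"type\": \"text\", \"text\": question},\n",
    "        {\"type\": \"media\", \"mime_type\": mime_type, \"file_uri\": file_uri},\n",
    "    ]\n",
    "\n",
    "\n",
    "def make_video_qa_chain(llm, file_uri, mime_type):\n",
    "    \"\"\"Return a runnable that maps a question string to the answer text.\"\"\"\n",
    "    # Imported here so build_video_content stays usable on its own\n",
    "    from langchain_core.messages import HumanMessage\n",
    "    from langchain_core.output_parsers import StrOutputParser\n",
    "    from langchain_core.runnables import RunnableLambda\n",
    "\n",
    "    def to_messages(question):\n",
    "        content = build_video_content(question, file_uri, mime_type)\n",
    "        return [HumanMessage(content=content)]\n",
    "\n",
    "    # question -> [HumanMessage] -> chat model -> plain string\n",
    "    return RunnableLambda(to_messages) | llm | StrOutputParser()\n",
    "\n",
    "\n",
    "# Example (assumes llm and video_file from the cells above):\n",
    "# chain = make_video_qa_chain(llm, video_file.uri, video_file.mime_type)\n",
    "# print(chain.invoke(\"What color is the train?\"))"
   ]
  },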
  {
   "cell_type": "markdown",
   "id": "5a3a5b40",
   "metadata": {},
   "source": [
    "## File Deletion\n",
    "\n",
    "Files are automatically deleted after 2 days, or you can delete them manually with ```genai.delete_file()```."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "bf71d026",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The video has been deleted: https://generativelanguage.googleapis.com/v1beta/files/ycq94nkeb9gd\n"
     ]
    }
   ],
   "source": [
    "# File deletion\n",
    "genai.delete_file(video_file.name)\n",
    "\n",
    "print(f\"The video has been deleted: {video_file.uri}\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "langchain-kr-lwwSZlnu-py3.11",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
